Goto

Collaborating Authors

 mean squared error


Appendix

Neural Information Processing Systems

In this appendix, we first introduce the datasets and evaluation metrics used in the experiments in Section A. Then, we provide extra experimental results in Section B. In Section C, we present details of network design, training scheme, and hyper-parameter tuning. We conduct experiments on 11 popular time series datasets: (1) Electricity Transformer Temperature [42] (ETTh(1,2),ETTm1) 3consists of 2 year electric power data collected from two separated counties of China. Each data point includes an "oil temperature" value and 6 power load features. The data is aggregated into 5-minutes windows, resulting in 12 points per hour and 288 points per day. A.1 Electricity Transformer Temperature (ETT) For data pre-processing, we perform zero-mean normalization, i.e., X We use Mean Absolute Errors (MAE) [17] and Mean Squared Errors (MSE) [26] for model comparison.


Adaptive Normalization Mamba with Multi Scale Trend Decomposition and Patch MoE Encoding

arXiv.org Artificial Intelligence

Time series forecasting in real world environments faces significant challenges non stationarity, multi scale temporal patterns, and distributional shifts that degrade model stability and accuracy. This study propose AdaMamba, a unified forecasting architecture that integrates adaptive normalization, multi scale trend extraction, and contextual sequence modeling to address these challenges. AdaMamba begins with an Adaptive Normalization Block that removes non stationary components through multi scale convolutional trend extraction and channel wise recalibration, enabling consistent detrending and variance stabilization. The normalized sequence is then processed by a Context Encoder that combines patch wise embeddings, positional encoding, and a Mamba enhanced Transformer layer with a mixture of experts feed forward module, allowing efficient modeling of both long range dependencies and local temporal dynamics. A lightweight prediction head generates multi horizon forecasts, and a denormalization mechanism reconstructs outputs by reintegrating local trends to ensure robustness under varying temporal conditions. AdaMamba provides strong representational capacity with modular extensibility, supporting deterministic prediction and compatibility with probabilistic extensions. Its design effectively mitigates covariate shift and enhances predictive reliability across heterogeneous datasets. Experimental evaluations demonstrate that AdaMamba's combination of adaptive normalization and expert augmented contextual modeling yields consistent improvements in stability and accuracy over conventional Transformer based baselines.


xLSTMAD: A Powerful xLSTM-based Method for Anomaly Detection

arXiv.org Artificial Intelligence

The recently proposed xLSTM is a powerful model that leverages expressive multiplicative gating and residual connections, providing the temporal capacity needed for long-horizon forecasting and representation learning. This architecture has demonstrated success in time series forecasting, lossless compression, and even large-scale language modeling tasks, where its linear memory footprint and fast inference make it a viable alternative to Transformers. Despite its growing popularity, no prior work has explored xLSTM for anomaly detection. In this work, we fill this gap by proposing xLSTMAD, the first anomaly detection method that integrates a full encoder-decoder xLSTM architecture, purpose-built for multivariate time series data. Our encoder processes input sequences to capture historical context, while the decoder is devised in two separate variants of the method. In the forecasting approach, the decoder iteratively generates forecasted future values xLSTMAD-F, while the reconstruction approach reconstructs the input time series from its encoded counterpart xLSTMAD-R. We investigate the performance of two loss functions: Mean Squared Error (MSE), and Soft Dynamic Time Warping (SoftDTW) to consider local reconstruction fidelity and global sequence alignment, respectively. We evaluate our method on the comprehensive TSB-AD-M benchmark, which spans 17 real-world datasets, using state-of-the-art challenging metrics such as VUS-PR. In our results, xLSTM showcases state-of-the-art accuracy, outperforming 23 popular anomaly detection baselines. Our paper is the first work revealing the powerful modeling capabilities of xLSTM for anomaly detection, paving the way for exciting new developments on this subject. Our code is available at: https://github.com/Nyderx/xlstmad


Conditional Neural ODE for Longitudinal Parkinson's Disease Progression Forecasting

arXiv.org Artificial Intelligence

Parkinson's disease (PD) shows heterogeneous, evolving brain-morphometry patterns. Modeling these longitudinal trajectories enables mechanistic insight, treatment development, and individualized 'digital-twin' forecasting. However, existing methods usually adopt recurrent neural networks and transformer architectures, which rely on discrete, regularly sampled data while struggling to handle irregular and sparse magnetic resonance imaging (MRI) in PD cohorts. Moreover, these methods have difficulty capturing individual heterogeneity including variations in disease onset, progression rate, and symptom severity, which is a hallmark of PD. To address these challenges, we propose CNODE (Conditional Neural ODE), a novel framework for continuous, individualized PD progression forecasting. The core of CNODE is to model morphological brain changes as continuous temporal processes using a neural ODE model. In addition, we jointly learn patient-specific initial time and progress speed to align individual trajectories into a shared progression trajectory. We validate CNODE on the Parkinson's Progression Markers Initiative (PPMI) dataset. Experimental results show that our method outperforms state-of-the-art baselines in forecasting longitudinal PD progression.


Distributionally Robust Feature Selection

arXiv.org Artificial Intelligence

We study the problem of selecting limited features to observe such that models trained on them can perform well simultaneously across multiple subpopulations. This problem has applications in settings where collecting each feature is costly, e.g. requiring adding survey questions or physical sensors, and we must be able to use the selected features to create high-quality downstream models for different populations. Our method frames the problem as a continuous relaxation of traditional variable selection using a noising mechanism, without requiring backpropagation through model training processes. By optimizing over the variance of a Bayes-optimal predictor, we develop a model-agnostic framework that balances overall performance of downstream prediction across populations. We validate our approach through experiments on both synthetic datasets and real-world data.


One Whisper to Grade Them All

arXiv.org Artificial Intelligence

We present an efficient end-to-end approach for holistic Automatic Speaking Assessment (ASA) of multi-part second-language tests, developed for the 2025 Speak & Improve Challenge. Our system's main novelty is the ability to process all four spoken responses with a single Whisper-small encoder, combine all information via a lightweight aggregator, and predict the final score. This architecture removes the need for transcription and per-part models, cuts inference time, and makes ASA practical for large-scale Computer-Assisted Language Learning systems. Our system achieved a Root Mean Squared Error (RMSE) of 0.384, outperforming the text-based baseline (0.44) while using at most 168M parameters (about 70% of Whisper-small). Furthermore, we propose a data sampling strategy, allowing the model to train on only 44.8% of the speakers in the corpus and still reach 0.383 RMSE, demonstrating improved performance on imbalanced classes and strong data efficiency.


FHRFormer: A Self-supervised Transformer Approach for Fetal Heart Rate Inpainting and Forecasting

arXiv.org Artificial Intelligence

Approximately 10\% of newborns require assistance to initiate breathing at birth, and around 5\% need ventilation support. Fetal heart rate (FHR) monitoring plays a crucial role in assessing fetal well-being during prenatal care, enabling the detection of abnormal patterns and supporting timely obstetric interventions to mitigate fetal risks during labor. Applying artificial intelligence (AI) methods to analyze large datasets of continuous FHR monitoring episodes with diverse outcomes may offer novel insights into predicting the risk of needing breathing assistance or interventions. Recent advances in wearable FHR monitors have enabled continuous fetal monitoring without compromising maternal mobility. However, sensor displacement during maternal movement, as well as changes in fetal or maternal position, often lead to signal dropouts, resulting in gaps in the recorded FHR data. Such missing data limits the extraction of meaningful insights and complicates automated (AI-based) analysis. Traditional approaches to handle missing data, such as simple interpolation techniques, often fail to preserve the spectral characteristics of the signals. In this paper, we propose a masked transformer-based autoencoder approach to reconstruct missing FHR signals by capturing both spatial and frequency components of the data. The proposed method demonstrates robustness across varying durations of missing data and can be used for signal inpainting and forecasting. The proposed approach can be applied retrospectively to research datasets to support the development of AI-based risk algorithms. In the future, the proposed method could be integrated into wearable FHR monitoring devices to achieve earlier and more robust risk detection.


Magnitude Matters: a Superior Class of Similarity Metrics for Holistic Semantic Understanding

arXiv.org Artificial Intelligence

Vector comparison in high dimensions is a fundamental task in NLP, yet it is dominated by two baselines: the raw dot product, which is unbounded and sensitive to vector norms, and the cosine similarity, which discards magnitude information entirely. This paper challenges both standards by proposing and rigorously evaluating a new class of parameter-free, magnitude-aware similarity metrics. I introduce two such functions, Overlap Similarity (OS) and Hyperbolic Tangent Similarity (HTS), designed to integrate vector magnitude and alignment in a more principled manner. To ensure that my findings are robust and generalizable, I conducted a comprehensive evaluation using four state-of-the-art sentence embedding models (all-MiniLM-L6-v2, all-mpnet-base-v2, paraphrase-mpnet-base-v2, and BAAI/bge-large-en-v1.5) across a diverse suite of eight standard NLP benchmarks, including STS-B, SICK, Quora, and PAWS. Using the Wilcoxon signed-rank test for statistical significance, my results are definitive: on the tasks requiring holistic semantic understanding (paraphrase and inference), both OS and HTS provide a statistically significant improvement in Mean Squared Error over both the raw dot product and cosine similarity, regardless of the underlying embedding model.Crucially, my findings delineate the specific domain of advantage for these metrics: for tasks requiring holistic semantic understanding like paraphrase and inference, my magnitude-aware metrics offer a statistically superior alternative. This significant improvement was not observed on benchmarks designed to test highly nuanced compositional semantics (SICK, STS-B), identifying the challenge of representing compositional text as a distinct and important direction for future work.


Knowledge-Guided Adaptive Mixture of Experts for Precipitation Prediction

arXiv.org Artificial Intelligence

Accurate precipitation forecasting is indispensable in agriculture, disaster management, and sustainable strategies. However, predicting rainfall has been challenging due to the complexity of climate systems and the heterogeneous nature of multi-source observational data, including radar, satellite imagery, and surface-level measurements. The multi-source data vary in spatial and temporal resolution, and they carry domain-specific features, making it challenging for effective integration in conventional deep learning models. Previous research has explored various machine learning techniques for weather prediction; however, most struggle with the integration of data with heterogeneous modalities. To address these limitations, we propose an Adaptive Mixture of Experts (MoE) model tailored for precipitation rate prediction. Each expert within the model specializes in a specific modality or spatio-temporal pattern. We also incorporated a dynamic router that learns to assign inputs to the most relevant experts. Our results show that this modular design enhances predictive accuracy and interpretability. In addition to the modeling framework, we introduced an interactive web-based visualization tool that enables users to intuitively explore historical weather patterns over time and space. The tool was designed to support decision-making for stakeholders in climate-sensitive sectors. We evaluated our approach using a curated multimodal climate dataset capturing real-world conditions during Hurricane Ian in 2022. The benchmark results show that the Adaptive MoE significantly outperformed all the baselines.


Computational Fluid Dynamics Optimization of F1 Front Wing using Physics Informed Neural Networks

arXiv.org Artificial Intelligence

In response to recent FIA regulations reducing Formula 1 team wind tunnel hours (from 320 hours for last-place teams to 200 hours for championship leaders) and strict budget caps of 135 million USD per year, more efficient aerodynamic development tools are needed by teams. Conventional computational fluid dynamics (CFD) simulations, though offering high fidelity results, require large computational resources with typical simulation durations of 8-24 hours per configuration analysis. This article proposes a Physics-Informed Neural Network (PINN) for the fast prediction of Formula 1 front wing aerodynamic coefficients. The suggested methodology combines CFD simulation data from SimScale with first principles of fluid dynamics through a hybrid loss function that constrains both data fidelity and physical adherence based on Navier-Stokes equations. Training on force and moment data from 12 aerodynamic features, the PINN model records coefficient of determination (R-squared) values of 0.968 for drag coefficient and 0.981 for lift coefficient prediction while lowering computational time. The physics-informed framework guarantees that predictions remain adherent to fundamental aerodynamic principles, offering F1 teams an efficient tool for the fast exploration of design space within regulatory constraints.